#ml ops

3 posts

Phase 2: the eval set must never see pre-annotations

The single decision that most human-in-the-loop projects get wrong. If your eval labels were seeded by the model's own predictions, every F1 number you ever report is biased toward the model. The fix is cheap on day one, expensive on day forty.

05/11/2026 · evaluation, hitl, ml ops
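A minimal sketch of the day-one fix, assuming tasks are built as dicts shaped like Label Studio's task-import JSON (a `data` dict plus an optional `predictions` list). The eval/train assignment comes from a hash of each item's ID, so an item lands in the same pool on every re-import. Only training-pool tasks get the model's pre-annotations. The helper names `is_eval` and `build_tasks`, the `predict` callable, and the 10% eval fraction are hypothetical and used only for illustration.

```python
import hashlib
from typing import Callable


def is_eval(item_id: str, eval_fraction: float = 0.1) -> bool:
    """Hypothetical helper: stable eval/train assignment from a hash of the ID."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < eval_fraction


def build_tasks(
    items: list[dict],
    predict: Callable[[dict], list],
    eval_fraction: float = 0.1,
) -> list[dict]:
    """Hypothetical helper: eval tasks are never seeded with predictions."""
    tasks = []
    for item in items:
        task = {"data": item}
        if not is_eval(item["id"], eval_fraction):
            # Only training-pool tasks are pre-annotated by the model.
            task["predictions"] = predict(item)
        tasks.append(task)
    return tasks
```

Because the split depends only on the ID, it can be decided before any model exists. That is what keeps the fix cheap on day one.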

Phase 2: don't retrain on every export

After every batch of human-corrected episodes is exported from Label Studio, the temptation is to retrain immediately. Here are the reasons not to, and a cheap cadence trigger that fires when retraining is actually likely to help.

05/11/2026 · fine-tuning, hitl, ml ops
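The post's exact trigger isn't reproduced in this summary. What follows is a minimal sketch of one plausible cheap trigger, assuming each export records how many episodes it contained and in how many the human changed the model's pre-annotation. The trigger fires only when enough new corrections have accumulated and the model is still being corrected often enough for retraining to be worth the cost. `ExportBatch`, `should_retrain`, and both thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExportBatch:
    """Hypothetical per-export summary."""
    n_episodes: int  # episodes in this export
    n_changed: int   # episodes where the human changed the model's pre-annotation


def should_retrain(
    batches_since_last_train: list[ExportBatch],
    min_episodes: int = 500,
    min_change_rate: float = 0.05,
) -> bool:
    """Fire only on enough new data AND enough evidence the model is still wrong."""
    total = sum(b.n_episodes for b in batches_since_last_train)
    if total < min_episodes:
        return False  # too little new data to move the model
    changed = sum(b.n_changed for b in batches_since_last_train)
    # If humans rarely change the predictions, retraining buys little.
    return changed / total >= min_change_rate
```

Both inputs are counts you already have after each export, so evaluating the trigger costs almost nothing compared with a training run.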

Phase 2: version the slice, not the snapshot

When you fine-tune model v3, you need to be able to answer "which exported corrections went into it". Snapshotting the whole training set is the obvious answer and the wrong one. Track the inputs and the derivation; the training set is a function of them.

05/11/2026 · ml ops, reproducibility, hitl
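A minimal sketch of "track the inputs and the derivation", assuming each Label Studio export has a stable ID and the derivation (filters, dedup rules, split seed) can be written as a JSON-serializable dict. The slice version is a hash of that manifest rather than of the materialized training set. Export order and dict key order don't affect the version, but changing any input or derivation setting does. `slice_manifest`, `slice_version`, and the manifest fields are hypothetical.

```python
import hashlib
import json


def slice_manifest(export_ids: list[str], derivation: dict) -> dict:
    """Hypothetical manifest: which exports went in, and how they were turned into a training set."""
    return {"exports": sorted(set(export_ids)), "derivation": derivation}


def slice_version(manifest: dict) -> str:
    """Short content hash of the manifest; identical inputs give an identical version."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
```

To answer "which exported corrections went into v3", store the manifest next to the model. The training set itself can be regenerated from the listed exports whenever it is needed.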